Conventional format conversion systems include optical character recognition (OCR) systems, as well as systems such as those used by GOOGLE that convert document data in PDF format, but not image data, into HTML (HyperText Markup Language) data for display by a web browser. There remains, however, a need for a system that can accept a document image as input and automatically read the text contained in that image.